Smart Paper Notes

SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL

Ruichu Cai, Jinjie Yuan, Boyan Xu, Zhifeng Hao

Category: Natural Language Processing

2021-11-01

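The Text-to-SQL task, which aims to translate natural language questions into SQL queries, has drawn much attention recently. One of the most challenging problems in Text-to-SQL is how to generalize a trained model to unseen database schemas, also known as the cross-domain Text-to-SQL task. The key lies in the generalizability of (i) the encoding method used to model the question and the database schema and (ii) the question-schema linking method used to learn the mapping between words in the question and tables/columns in the database schema. Focusing on these two key issues, we propose a Structure-Aware Dual Graph Aggregation Network (SADGA) for cross-domain Text-to-SQL. In SADGA, we adopt the graph structure to provide a unified encoding model for both the natural language question and the database schema. Based on this unified modeling, we further devise a structure-aware aggregation method to learn the mapping between the question graph and the schema graph. The structure-aware aggregation method features Global Graph Linking, Local Graph Linking, and a Dual-Graph Aggregation Mechanism. We not only study the performance of our proposal empirically but also achieved 3rd place on the challenging Text-to-SQL benchmark Spider at the time of writing.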

translated by Google Translate

Aesthetics Driven Autonomous Time-Lapse Photography Generation by Virtual and Real Robots

Xiaobo Gao, Qi Kuang, Xin Jin, Bin Zhou, Boyan Dong, Xunyu Wang

Category: Computer Vision

2022-08-22

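Time-lapse photography is used in movies and promotional films because it can convey the passage of time in a short clip and strengthen visual appeal. However, because it takes a long time and requires stable shooting, it is a great challenge for photographers. In this paper, we propose a time-lapse photography system with virtual and real robots. To help users shoot time-lapse videos efficiently, we first parameterize time-lapse photography and propose a parameter optimization method. For different parameters, different aesthetic models, including image and video aesthetic quality assessment networks, are used to generate optimal values. We then propose a time-lapse photography interface that lets users view and adjust parameters and use virtual robots to conduct virtual photography in a three-dimensional scene. The system can also export the parameters and provide them to real robots so that time-lapse videos can be filmed in the real world. In addition, we propose a time-lapse photography aesthetic assessment method that automatically evaluates the aesthetic quality of time-lapse videos. Experimental results show that our method can obtain time-lapse videos efficiently. We also conduct a user study. The results show that our system achieves an effect similar to that of professional photographers and is more efficient.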

LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling

Boyan Jiang, Xinlin Ren, Mingsong Dou, Xiangyang Xue, Yanwei Fu, Yinda Zhang

Category: Computer Vision

2022-08-18

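Recent progress in 4D implicit representation focuses on globally controlling shape and motion with low-dimensional latent vectors, an approach that is prone to missing surface details and accumulating tracking error. While many deep local representations have shown promising results for 3D shape modeling, their 4D counterpart does not yet exist. In this paper, we fill this gap by proposing a novel Local 4D implicit Representation for Dynamic clothed humans, named LoRD, which combines the merits of 4D human modeling and local representation and enables high-fidelity reconstruction with detailed surface deformations, such as clothing wrinkles. In particular, our key insight is to encourage the network to learn latent codes of local part-level representations that can explain the local geometry and temporal deformations. For inference at test time, we first estimate the inner body skeleton motion to track local parts at each time step, and then optimize the latent codes for each part via auto-decoding based on different types of observed data. Extensive experiments demonstrate that the proposed method has a strong capability for representing 4D humans and outperforms state-of-the-art methods on practical applications, including 4D reconstruction from sparse points and non-rigid depth fusion, both qualitatively and quantitatively.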

Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective

Quan Cui, Bingchen Zhao, Zhao-Min Chen, Borui Zhao, Renjie Song, Jiajun Liang, Boyan Zhou, Osamu Yoshie

Category: Computer Vision | Artificial Intelligence

2022-03-08

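This work simultaneously considers the discriminability and transferability properties of deep representations in a typical supervised learning task, namely image classification. Through a comprehensive temporal analysis, we observe a trade-off between these two properties. Discriminability keeps increasing as training progresses, while transferability diminishes sharply in the later training period. From the perspective of information-bottleneck theory, we reveal that the incompatibility between discriminability and transferability is attributable to over-compression of the input information. More importantly, we investigate why and how the over-compression can be alleviated, and further present a learning framework, named contrastive temporal coding (CTC), to counteract the over-compression and alleviate the incompatibility. Extensive experiments validate that CTC successfully mitigates the incompatibility, yielding discriminative and transferable representations. Noticeable improvements are achieved on the image classification task and on challenging transfer learning tasks. We hope that this work will raise awareness of the importance of the transferability property in the conventional supervised learning setting. Code is available at https://github.com/dtennant/dt-tradeoff.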

Data blurring: sample splitting a single sample

James Leiner, Boyan Duan, Larry Wasserman, Aaditya Ramdas

Category: (Statistical) Machine Learning

2021-12-21

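Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X), g(X))$ is tractable? As one example, if $X = (X_1, \dots, X_n)$ and $P$ is a product distribution, then for any $m < n$, we can split the sample to define $f(X) = (X_1, \dots, X_m)$ and $g(X) = (X_{m+1}, \dots, X_n)$. Rasines and Young (2021) offer an alternative route to this task through randomization of $X$ with additive Gaussian noise, which enables post-selection inference in finite samples for Gaussian distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data blurring, as an alternative to data splitting, data carving, and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.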
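The additive-Gaussian-noise route mentioned above can be illustrated numerically. The sketch below assumes one standard construction that the abstract does not spell out: for a draw from N(mu, sigma^2) with known sigma, draw independent noise Z from N(0, sigma^2) and set f = X + tau*Z and g = X - Z/tau. Under that assumption the two parts are uncorrelated (and, being jointly Gaussian, independent), while together they recover X exactly. The names `blur_gaussian`, `reconstruct`, and the parameter `tau` are hypothetical and chosen for illustration only.

```python
import random
import statistics


def blur_gaussian(x, sigma, tau, rng):
    """Split one draw x ~ N(mu, sigma^2) into two parts.

    Assumed construction (not taken from the abstract): draw
    Z ~ N(0, sigma^2) and return f = x + tau*Z, g = x - Z/tau.
    """
    z = rng.gauss(0.0, sigma)
    return x + tau * z, x - z / tau


def reconstruct(f, g, tau):
    """Recover x from both parts, using f + tau^2 * g = (1 + tau^2) * x."""
    return (f + tau**2 * g) / (1.0 + tau**2)


def sample_correlation(us, vs):
    """Pearson correlation, written out to avoid version-specific helpers."""
    mean_u, mean_v = statistics.fmean(us), statistics.fmean(vs)
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    var_u = sum((u - mean_u) ** 2 for u in us)
    var_v = sum((v - mean_v) ** 2 for v in vs)
    return cov / (var_u * var_v) ** 0.5


rng = random.Random(0)
mu, sigma, tau = 2.0, 1.5, 0.7
xs = [rng.gauss(mu, sigma) for _ in range(20000)]
parts = [blur_gaussian(x, sigma, tau, rng) for x in xs]
fs = [f for f, _ in parts]
gs = [g for _, g in parts]

# Both parts together give back x up to floating-point error.
max_error = max(abs(reconstruct(f, g, tau) - x) for (f, g), x in zip(parts, xs))
# The two parts should be (nearly) uncorrelated in a large sample.
corr = sample_correlation(fs, gs)
```

In this construction `tau` controls how the information is divided: a small `tau` leaves `f` close to the original draw and makes `g` very noisy, and a large `tau` does the opposite.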

ZeroVL: A Strong Baseline for Aligning Vision-Language Representations with Limited Resources

Quan Cui, Boyan Zhou, Yu Guo, Weidong Yin, Hao Wu, Osamu Yoshie

Category: Computer Vision

2021-12-17

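Pioneering dual-encoder pre-training works (e.g., CLIP and ALIGN) have revealed the potential of aligning multi-modal representations with contrastive learning. However, these works require a tremendous amount of data and computational resources (e.g., billion-level web data and hundreds of GPUs), which prevents researchers with limited resources from reproducing them and exploring further. To this end, we explore a stack of simple but effective heuristics and provide comprehensive training guidance, which allows us to conduct dual-encoder multi-modal representation alignment with limited resources. We provide a reproducible strong baseline with competitive results, namely ZeroVL, using only 14M publicly accessible academic data and 8 V100 GPUs. Additionally, we collect 100M web data for pre-training and achieve results comparable or superior to state-of-the-art methods, further demonstrating the effectiveness of our method on large-scale data. We hope that this work will provide useful data points and experience for future research in multi-modal pre-training. Our code and pre-trained models will be released to facilitate the research community.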
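To make "aligning multi-modal representations with contrastive learning" concrete, here is a minimal pure-Python sketch of a symmetric InfoNCE-style loss of the kind commonly used with dual encoders. It is an illustration under stated assumptions, not ZeroVL's training code: the function name `contrastive_alignment_loss`, the toy 2-D embeddings, and the temperature value 0.07 are all hypothetical choices made for the example.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def log_softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    lse = m + math.log(sum(math.exp(a - m) for a in row))
    return [a - lse for a in row]


def contrastive_alignment_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    Pair i (image_embs[i], text_embs[i]) is the positive; every other
    pairing in the batch acts as a negative. The temperature value is
    an assumption chosen for illustration.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = scaled cosine similarity of image i and text j.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)


images = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_alignment_loss(images, [[0.9, 0.1], [0.1, 0.9]])
shuffled = contrastive_alignment_loss(images, [[0.1, 0.9], [0.9, 0.1]])
```

The loss is small when each image embedding is closest to its own caption (`aligned`) and large when the pairing is swapped (`shuffled`), which is the signal that pulls the two encoders into a shared space.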

Lexicon-constrained Copying Network for Chinese Abstractive Summarization

Boyan Wan, Mishal Sohail

Category: Natural Language Processing

2020-10-16

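The copy mechanism allows sequence-to-sequence models to choose words from the input and put them directly into the output, and it is finding increasing use in abstractive summarization. However, since there is no explicit delimiter in Chinese sentences, most existing models for Chinese abstractive summarization can only perform character-level copying, which is inefficient. To solve this problem, we propose a lexicon-constrained copying network that models multiple granularities in both the encoder and the decoder. On the source side, words and characters are aggregated into the same input memory using a Transformer-based encoder. On the target side, the decoder can copy either a character or a multi-character word at each time step, and the decoding process is guided by a word-enhanced search algorithm that facilitates parallel computation and encourages the model to copy more words. Moreover, we adopt a word selector to integrate keyword information. Experimental results on a Chinese social media dataset show that our model can work standalone or together with the word selector. Both forms outperform previous character-based models and achieve competitive performance.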
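The basic idea of a copy mechanism can be sketched as a mixture of two distributions. The example below follows the familiar pointer-generator style of mixing, which is swapped in here for illustration and is not the lexicon-constrained network proposed in the paper. The function `copy_mixture`, the toy vocabulary, the attention weights, and the value of `p_gen` are all hypothetical.

```python
from collections import defaultdict


def copy_mixture(vocab_probs, source_tokens, attention, p_gen):
    """Mix a generation distribution with a copy distribution.

    vocab_probs: dict token -> probability from the decoder's softmax.
    source_tokens: input tokens (characters or multi-character words).
    attention: one weight per source token, summing to 1.
    p_gen: probability of generating rather than copying.
    """
    final = defaultdict(float)
    for token, prob in vocab_probs.items():
        final[token] += p_gen * prob
    # Copying moves attention mass onto the source tokens themselves,
    # so input tokens missing from the vocabulary can still be produced.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


vocab_probs = {"的": 0.5, "新闻": 0.3, "<unk>": 0.2}
# The source mixes a multi-character word with single characters.
source_tokens = ["北京", "冬", "奥", "会"]
attention = [0.7, 0.1, 0.1, 0.1]
mixed = copy_mixture(vocab_probs, source_tokens, attention, p_gen=0.4)
best = max(mixed, key=mixed.get)
```

Here the multi-character source word receives the most probability mass even though it is absent from the toy vocabulary, so a single decoding step can emit it whole instead of copying it one character at a time.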

BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition

Boyan Zhou, Quan Cui, Xiu-Shen Wei, Zhao-Min Chen

Category:

2019-12-05

Our work focuses on tackling the challenging but natural visual recognition task of long-tailed data distribution (i.e., a few classes occupy most of the data, while most classes have very few samples). In the literature, class re-balancing strategies (e.g., re-weighting and re-sampling) are the prominent and effective methods proposed to alleviate the extreme imbalance in long-tailed problems. In this paper, we first discover that these re-balancing methods achieve satisfactory recognition accuracy because they significantly promote the classifier learning of deep networks. However, at the same time, they unexpectedly damage the representative ability of the learned deep features to some extent. Therefore, we propose a unified Bilateral-Branch Network (BBN) to take care of both representation learning and classifier learning simultaneously, where each branch performs its own duty separately. In particular, our BBN model is further equipped with a novel cumulative learning strategy, which is designed to first learn the universal patterns and then gradually pay attention to the tail data. Extensive experiments on four benchmark datasets, including the large-scale iNaturalist ones, show that the proposed BBN significantly outperforms state-of-the-art methods. Furthermore, validation experiments demonstrate both our preliminary discovery and the effectiveness of the tailored designs in BBN for long-tailed problems. Our method won first place in the iNaturalist 2019 large scale species classification competition, and our code is open-source and available at https://github.com/Megvii-Nanjing/BBN. * Q. Cui and Z.-M. Chen's contribution was made when they were interns in Megvii Research Nanjing, Megvii Technology, China.
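The cumulative learning idea, shifting attention from universal patterns to tail data as training proceeds, can be sketched as a weight that decays over epochs and blends the two branches' losses. The abstract does not give the schedule; the parabolic decay below is an assumption made for illustration, and `cumulative_alpha` and `mixed_loss` are hypothetical names.

```python
def cumulative_alpha(epoch, max_epoch):
    """Weight on the conventional branch at a given epoch.

    Assumed parabolic decay: starts at 1 (focus on universal patterns)
    and falls to 0 (focus on the re-balanced, tail-oriented branch).
    """
    return 1.0 - (epoch / max_epoch) ** 2


def mixed_loss(loss_conventional, loss_rebalanced, alpha):
    """Blend the two branches' losses with the current weight."""
    return alpha * loss_conventional + (1.0 - alpha) * loss_rebalanced


# Weight on the conventional branch at the start, middle, and end.
schedule = [cumulative_alpha(t, 100) for t in (0, 50, 100)]
# Halfway through, the conventional branch still dominates.
halfway = mixed_loss(2.0, 4.0, cumulative_alpha(50, 100))
```

Because the decay is slow at first and fast near the end, the conventional branch dominates for most of training, which matches the stated goal of learning general features before adapting to the tail.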

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han, Jiangning Zhang, Zhucun Xue, Chao Xu, Xintian Shen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li

Category: Computer Vision

2023-01-03

Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.

AI in HCI Design and User Experience

Wei Xu

Category: Artificial Intelligence

2023-01-03

In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
